Descriptive Schema: Semantics-based Query Answering

Authors

  • Sau Dan Lee
  • Patrick Yee
  • Thomas Y. Lee
  • David Wai-Lok Cheung
  • Wenjun Yuan
Abstract

We propose the novel concept of “descriptive schema” (DS). Unlike ordinary database schemas, a DS does not restrict the structure of the underlying database. Rather, it is just a probabilistic description of the structure. When answering keyword queries, DS can be used to improve semantics-based query answering and result ranking.

1 Schema: To have or not to have?

Wikipedia is a rich repository of information. However, facilities to exploit the information are still limited. Although typical WWW search engines such as Google[1] allow users to look for information using keywords, they lack a schema for formulating the queries precisely. Besides hyperlinks among the Wikipedia pages, many pages have Category tags as well as Infoboxes, which can be exploited to perform more sophisticated searches. For example, the DBpedia community makes use of these tags to build a database of RDF triples, allowing more expressive and precise queries in the form of SPARQL to be used to retrieve useful information [2].

The above are two extremes of search and query. In the former case, the user can perform a search easily using relevant keywords, without having to learn the schema’s lexicon beforehand. In the latter case, a schema can be used to help specify the query more precisely, but it has a non-trivial learning curve. In this paper, we propose the approach of “descriptive schema” to address these shortcomings. We attempt to strike a balance between the ease of use of a schema-less approach and the high accuracy that a schema-based system can bring us.

2 Descriptive Schema

In this paper, we propose a new concept called “Descriptive Schema” (DS). Unlike XSD (XML Schema Definition), DS is not meant to prescriptively mandate a structure on the underlying data. We want to retain the flexibility of free format for the pages. Rather, DS, as its name implies, is descriptive. It is only a summary of the structure exhibited by the underlying database. It does not define the structure. The data may occasionally violate the DS. This tolerance to violations marks our biggest innovation, contrasting with existing approaches. Existing approaches to data modelling use “Prescriptive Schema”, which mandates a rigid structure on the underlying data, with little (if any) tolerance to violations.

We model a DS by a set of rules on the underlying data. There are many possible ways to formulate the rules. One example rule is: “90% of the time, a page of class ‘Countries’ has a value for the field ‘capital’ in the infobox (infobox for countries)”. Note that the rules defined in this way are probabilistic, because they are not satisfied all the time. A DS may thus be considered a summary of the patterns occurring in a database, instead of policies imposed on the data. The task of discovering a DS from a database is a mining task: the problem of finding all rules that satisfy the specified syntax and support thresholds, thus following the data mining model in [3].
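
As a rough illustration of the kind of rule described above, the following minimal sketch mines rules of the form "a page of class C has a value for field F with probability p" from a toy collection of pages. The rule shape, the function name `mine_descriptive_schema`, the thresholds `min_support` and `min_confidence`, and the sample pages are all assumptions made for this example; the text does not specify the authors' actual rule syntax or mining algorithm.

```python
from collections import Counter, defaultdict


def mine_descriptive_schema(pages, min_support=2, min_confidence=0.5):
    """Mine probabilistic (class, field) rules from pages.

    pages: iterable of (page_class, infobox_fields) pairs.
    Returns a sorted list of (page_class, field, confidence, support),
    where confidence is the fraction of pages of that class that have
    a value for the field. This rule shape is a hypothetical one.
    """
    class_counts = Counter()             # number of pages per class
    field_counts = defaultdict(Counter)  # per class: pages having each field

    for page_class, fields in pages:
        class_counts[page_class] += 1
        for field in set(fields):
            field_counts[page_class][field] += 1

    rules = []
    for page_class, total in class_counts.items():
        for field, support in field_counts[page_class].items():
            confidence = support / total
            # Keep only rules that pass both (assumed) thresholds.
            if support >= min_support and confidence >= min_confidence:
                rules.append((page_class, field, confidence, support))
    return sorted(rules)


# Hypothetical toy data: each page is (class, set of filled infobox fields).
pages = [
    ("Countries", {"capital", "population"}),
    ("Countries", {"capital"}),
    ("Countries", {"capital", "currency"}),
    ("Countries", {"population"}),  # violates the 'capital' rule, still allowed
    ("Cities", {"mayor"}),
]

rules = mine_descriptive_schema(pages)
for page_class, field, confidence, support in rules:
    print(f"{confidence:.0%} of '{page_class}' pages have '{field}' "
          f"(support={support})")
```

Note that the fourth page lacks a capital yet remains in the data; it only lowers the confidence of the corresponding rule, which is the tolerance to violations that distinguishes a descriptive schema from a prescriptive one.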


Similar resources

Optimizing Reformulation-based Query Answering in RDF

Reformulation-based query answering is a query processing technique aiming at answering queries under constraints. It consists of reformulating the query based on the constraints, so that evaluating the reformulated query directly against the data (i.e., without considering any more the constraints) produces the correct answer set. In this paper, we consider optimizing reformulation-based query...


Inconsistency-tolerant query answering in ontology-based data access

Ontology-based data access (OBDA) is receiving great attention as a new paradigm for managing information systems through semantic technologies. According to this paradigm, a Description Logic ontology provides an abstract and formal representation of the domain of interest to the information system, and is used as a sophisticated schema for accessing the data and formulating queries over them....


Data exchange: query answering for incomplete data sources

Data exchange is the problem of transforming data structured under a schema, called the source schema, into data structured under another schema, called the target schema. Existing work on data exchange considers settings where the source instance does not contain incomplete information. In this paper we study semantics and address algorithmic issues for data exchange settings where the source ...


New Inconsistency-Tolerant Semantics for Robust Ontology-Based Data Access

In ontology-based data access (OBDA) [17], an ontology provides an abstract and formal representation of the domain of interest, which is used as a virtual schema when formulating queries over the data. Current research in OBDA mostly focuses on ontology specification languages for which conjunctive query answering is first-order (FO) rewritable. In a nutshell, FO-rewritability means that query...


Answering SPARQL queries modulo RDF Schema with paths

SPARQL is the standard query language for RDF graphs. In its strict instantiation, it only offers querying according to the RDF semantics and would thus ignore the semantics of data expressed with respect to (RDF) schemas or (OWL) ontologies. Several extensions to SPARQL have been proposed to query RDF data modulo RDFS, i.e., interpreting the query with RDFS semantics and/or considering externa...



Publication date: 2008